Abstract. We continue our study of open and closed languages. We 
investigate how the properties of being open and closed are preserved 
under concatenation. We investigate analogues, in formal languages, of 
the separation axioms in topological spaces; one of our main results is 
that there is a clopen partition separating two words if and only if the 
words do not commute. We show that we can decide in quadratic time if the 
language specified by a DFA is closed, but if the language is specified by 
an NFA, the problem is PSPACE-complete. 



1 Introduction 

In a previous paper [2] , we extended the work of Peleg [6] on closure operators 
of formal languages. In that paper, we observed that both positive and Kleene 
closure can be viewed as instances of a closure operator $\square$; a language $L$ is 
closed if $L = L^{\square}$. Similarly, a language is open if it is the complement of a 
closed language. A language is clopen if it is both closed and open. We proved 
many properties of open and closed languages, which share some but not all 
properties with analogous concepts in topological spaces. We proved two versions 
of Kuratowski's theorem for applying any number of the operators of closure and 
complement in any order, and we gave a complete characterization of all algebras 
resulting from this process. 

In this paper, we continue our study of open and closed languages. In Section 2 
we investigate how the properties of being open and closed are preserved 
under concatenation. In Section 3 we investigate analogues, in formal languages, 
of the separation axioms in topological spaces; one of our main results 
(Theorem 6) is that there is a clopen partition separating two words if and only 
if the words do not commute. In Section 4 we show that we can decide in quadratic 
time if the language specified by a DFA is closed, but if the language is specified 
by an NFA, the problem is PSPACE-complete. Finally, in Section 5 we mention some 
analogues of the compactness property. 



2 Closure operators and concatenation 

We recall two results from [5]: 

Proposition 1. Let $L \subseteq \Sigma^*$. 

(a) $L$ is positive-closed if and only if, for all $u, v \in L$, we have $uv \in L$. 

(b) $L$ is positive-open if and only if, for all $u, v \in \Sigma^*$ such that $uv \in L$, 
we have $u \in L$ or $v \in L$. 

We note that the concatenation of two closed languages need not be closed, 
and that the concatenation of two open languages need not be open. Consider 
the languages $L = \{a\}^+$ and $M = \{b\}^+$ for distinct letters $a, b \in \Sigma$, 
which are both clopen (under positive closure). Then $ab \in LM$ but $abab \notin LM$, 
so $LM$ is not closed. Additionally, $ab \in LM$, but neither $a$ nor $b$ is in $LM$, 
so $LM$ is not open. However, we do have several results regarding cases when the 
concatenation of closed or open languages must be closed or open. 
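
The counterexample above can be checked mechanically. The following is a minimal sketch, assuming the concrete letters `a` and `b` and using a regular expression as the membership test for $LM$; the helper name `in_LM` is illustrative and does not come from the paper.

```python
import re

def in_LM(w: str) -> bool:
    """Membership in LM, where L = {a}+ and M = {b}+ (assumed letters)."""
    return re.fullmatch(r"a+b+", w) is not None

# LM is not closed: ab is in LM, but abab = (ab)(ab) is not.
print(in_LM("ab"), in_LM("abab"))

# LM is not open: ab is in LM, but neither factor of the split a.b is.
print(in_LM("a"), in_LM("b"))
```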

Throughout this paper, the words closed, open, and clopen refer to their 
respective notions under positive closure, as it is the most general; most of our 
theorems will have obvious analogues in the Kleene closure case. However, the 
presence of $\varepsilon$ (or lack thereof) can be crucial when dealing with the 
concatenation of languages, so we will mention a few exceptional cases where the 
choice of positive or Kleene closure is important. 

Theorem 1. Let $L, M \subseteq \Sigma^*$. 

(a) Suppose $L$ is positive-closed, and let $k$ be a positive integer. Then 
$L^k \subseteq L$ and $L^k$ is positive-closed. 

(b) Suppose $L$ is Kleene-closed, and let $k$ be a positive integer. Then $L^k = L$. 

(c) Suppose $L$ and $M$ are positive-closed (respectively, Kleene-closed) languages 
satisfying the equation $LM = ML$. Then $LM$ is positive-closed (respectively, 
Kleene-closed). 

(d) Suppose $L$ and $M$ are positive-closed (respectively, Kleene-closed) unary 
languages (that is, $L, M \subseteq \{a\}^*$ for some symbol $a$). Then $LM$ is 
positive-closed (respectively, Kleene-closed). 

Proof. (a) Let $u, v \in L^k$. We show that $L^k$ is positive-closed by proving that 
$uv \in L^k$. Let $u = u_1 u_2 \cdots u_k$ and $v = v_1 v_2 \cdots v_k$, where 
$u_1, \ldots, u_k, v_1, \ldots, v_k \in L$. Then $u \in L$ since $L$ is closed, and 
thus $L^k \subseteq L$. Similarly, $u v_1 \in L^{k+1} \subseteq L$, and thus 
$uv = (u v_1) v_2 \cdots v_k \in L^k$. So $L^k$ is positive-closed. 

(b) If $L$ is Kleene-closed, then $L = L\{\varepsilon\}^{k-1} \subseteq L^k$. But $L$ 
must also be positive-closed, so $L^k \subseteq L$ by part (a). Thus $L^k = L$. 

(c) We examine the positive-closed case first. If $u, v \in LM$, then let 
$u = u_1 u_2$ and $v = v_1 v_2$, where $u_1, v_1 \in L$ and $u_2, v_2 \in M$. Then 
$u_2 v_1 \in ML = LM$ since $L$ and $M$ commute. Thus $uv = u_1 u_2 v_1 v_2 \in LLMM$. 
But $LL \subseteq L$ and $MM \subseteq M$ by part (a), so $LLMM \subseteq LM$, 
implying that $uv \in LM$ and thus $LM$ is positive-closed. For the Kleene-closed 
case, we simply note that if $\varepsilon \in L$ and $\varepsilon \in M$ then 
$\varepsilon \in LM$, and the result follows. 

(d) This is a special case of part (c), since unary languages commute. 

■ 

Theorem 2. Let $L, M \subseteq \Sigma^*$. Suppose $L$ and $M$ are positive-closed 
(respectively, Kleene-closed) and such that $L \cup M$ is positive-closed. Then the 
following hold: 

(a) $LM$ is positive-closed (respectively, Kleene-closed). 

(b) More generally, consider the free semigroup of languages $\{L, M\}^+$ generated 
by $L$ and $M$. Let $W \in \{L, M\}^+$. Then $W$ is positive-closed (respectively, 
Kleene-closed) when considered as a language over $\Sigma$. 

Proof. (a) Let $u, v \in LM$, and write $u = u_1 u_2$ and $v = v_1 v_2$, where 
$u_1, v_1 \in L$ and $u_2, v_2 \in M$. If $L \cup M$ is positive-closed, then 
$L \cup M = (L \cup M)^+$. But $u_2 v_1 \in (L \cup M)^+ = L \cup M$, so either 
$u_2 v_1 \in L$ or $u_2 v_1 \in M$. If $u_2 v_1 \in L$, then $u_1 u_2 v_1 \in L$ by 
closure of $L$, and thus $u_1 u_2 v_1 v_2 \in LM$. Similarly, if $u_2 v_1 \in M$, 
then $u_2 v_1 v_2 \in M$, and hence $u_1 u_2 v_1 v_2 \in LM$. So in all cases we 
have $uv = u_1 u_2 v_1 v_2 \in LM$, and hence $LM$ is closed. For the Kleene-closed 
case, we again simply note that if $\varepsilon \in L$ and $\varepsilon \in M$, then 
$\varepsilon \in LM$. 

(b) The cases where $W = L^k$ or $W = M^k$ are proven by Theorem 1, so we 
may assume that $W$ contains at least one $L$ and one $M$ (when considered 
as a word in $\{L, M\}^+$). This implies that either $LM$ or $ML$ is a factor 
of $W$. Suppose without loss of generality that $LM$ is a factor of $W$. Let 
$W = W_1 W_2 \cdots W_k W_{k+1} \cdots W_n$, where $W_i \in \{L, M\}$ for all $i$, and 
specifically $W_k = L$ and $W_{k+1} = M$. Now, to prove that $W$ is closed, we let 
$u, v \in W$ be words in $\Sigma^*$. We show that $uv \in W$. Let $u = u_1 \cdots u_n$ 
and $v = v_1 \cdots v_n$, where $u_i, v_i \in W_i$ for all $i$. Consider 
$x = u_{k+1} \cdots u_n v_1 \cdots v_k$, a factor of $uv$. We see that 
$x \in (L \cup M)^+$. But since $L \cup M$ is positive-closed, 
$(L \cup M)^+ = L \cup M$, and hence either $x \in L$ or $x \in M$. If $x \in L$, 
then $u_k x \in L = W_k$ by closure of $L$, and thus 
$uv = u_1 \cdots u_{k-1} (u_k x) v_{k+1} \cdots v_n \in W_1 \cdots W_n = W$. 
If $x \in M$, then $x v_{k+1} \in M = W_{k+1}$ by closure of $M$, and thus 
$uv = u_1 \cdots u_k (x v_{k+1}) v_{k+2} \cdots v_n \in W_1 \cdots W_n = W$. 
So we must have $uv \in W$ in either case, and thus $W$ is closed. For the 
Kleene-closed case, we again simply note that if $\varepsilon \in L$ and 
$\varepsilon \in M$, then $\varepsilon \in W$. 

■ 

Theorem 3. Let $L$ and $M$ be open. 

(a) Suppose $\varepsilon \in L$ and $\varepsilon \in M$. Then $LM$ is open. 

(b) Suppose $\varepsilon \notin L$ and $\varepsilon \notin M$. Then $LM$ is open if and 
only if $L = \emptyset$ or $M = \emptyset$. 

(c) $LL$ is open if and only if $\varepsilon \in L$ or $L = \emptyset$. 

(d) If neither $L$ nor $M$ is empty and $\varepsilon \in L \cup M$ but 
$\varepsilon \notin L \cap M$, then we may or may not have $LM$ open, even in the 
unary case. 

Proof. (a) Let $ab \in LM$, where $a \in L$ and $b \in M$. Let $ab = uv$ for some 
words $u$ and $v$. To prove that $LM$ is open, we must show that either 
$u \in LM$ or $v \in LM$. We have two cases: either $u$ is a prefix of $a$, or 
$v$ is a suffix of $b$. If $u$ is a prefix of $a$, let $a = ux$, so $ab = uxb$ and 
hence $v = xb$. Since $L$ is open, applying Proposition 1(b) to $a \in L$ implies 
that either $u \in L$ or $x \in L$. If $u \in L$, then since $\varepsilon \in M$, 
$u = u\varepsilon \in LM$ and we are done. If $x \in L$, then $v = xb \in LM$ and 
we are also done. 

The case where $v$ is a suffix of $b$ is similar and relies on us having 
$\varepsilon \in L$. 

(b) If $L = \emptyset$ or $M = \emptyset$, then $LM = \emptyset$, which is open. 
Conversely, if $\varepsilon \notin L$, $\varepsilon \notin M$, and neither 
$L = \emptyset$ nor $M = \emptyset$, then $LM$ is non-empty but contains no words 
of length 0 or 1. Repeatedly applying Proposition 1(b) to a word of a non-empty 
open language yields a word of length at most 1 in that language, so $LM$ is not 
open. 

(c) This follows immediately from parts (a) and (b). 

(d) If $L = \{\varepsilon, a, aaa, aaaaa\}$ and $M = \{a\}$ (which are both easily 
verified to be open), then we have $aaaaaa \in LM$, but $aaa \notin LM$, and thus 
$LM$ is not open. On the other hand, if $L = \{\varepsilon, a, aaa\}$ and 
$M = \{a\}$, then $LM = \{a, aa, aaaa\}$, which is clearly open. 
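
For finite languages, Proposition 1(b) gives a direct test for openness, so the two unary examples in part (d) can be verified by enumeration. This is a minimal sketch; the function names `is_open` and `concat` are illustrative, and languages are modelled as Python sets of strings with `""` standing for the empty word.

```python
from itertools import product

def is_open(lang: set[str]) -> bool:
    """Proposition 1(b) on a finite language: every split w = uv of a word
    in the language must have u or v in the language."""
    return all(
        w[:i] in lang or w[i:] in lang
        for w in lang
        for i in range(len(w) + 1)
    )

def concat(L: set[str], M: set[str]) -> set[str]:
    """Concatenation LM of two finite languages."""
    return {x + y for x, y in product(L, M)}

M = {"a"}
L1 = {"", "a", "aaa", "aaaaa"}   # first example: LM is not open
L2 = {"", "a", "aaa"}            # second example: LM is open

print(is_open(concat(L1, M)), is_open(concat(L2, M)))
```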

■ 

Theorem 4. Let $L, M \subseteq \Sigma^*$ both be clopen. 

(a) If $L \cup M = \Sigma^*$, then $LM$ is clopen. 

(b) Suppose that $L \cup M = \Sigma^*$ and consider the free semigroup of languages 
$\{L, M\}^+$ generated by $L$ and $M$. Let $W \in \{L, M\}^+$. Then $W$ is clopen if 
and only if $W = \emptyset$ or $W$ contains at most one occurrence of a language 
which does not contain $\varepsilon$. 

(c) The converses of the above statements are false; indeed, it is possible that 
$LM$ is clopen, but $L \cup M$ is not even positive-closed. 

Proof. (a) From Theorem 2(a) we have that $LM$ is closed, since $L \cup M = \Sigma^*$ 
is closed. To show that $LM$ is open, let $ab \in LM$, where $a \in L$ and $b \in M$. 
Let $ab = uv$ for some words $u$ and $v$. To prove that $LM$ is open, we must show 
that either $u \in LM$ or $v \in LM$. There are two cases: either $u$ is a prefix of 
$a$, or $v$ is a suffix of $b$. 

Without loss of generality, we assume that $u$ is a prefix of $a$ and let $a = ux$, 
so $ab = uxb$ and hence $v = xb$. Since $L$ is open, applying Proposition 1(b) 
to $a \in L$ implies that either $u \in L$ or $x \in L$. If $x \in L$, then 
$v = xb \in LM$ and we are done. Otherwise, we have $x \notin L$, implying $u \in L$ 
and $x \in M$ since $L \cup M = \Sigma^*$. If $\varepsilon \in M$, then 
$u = u\varepsilon \in LM$ and we are done. Otherwise, we have $\varepsilon \notin M$, 
and thus $\varepsilon \in L$ since $L \cup M = \Sigma^*$. In this case, we note that 
$xb \in M$ since $x \in M$, $b \in M$, and $M$ is closed. Then 
$\varepsilon xb = v \in LM$. So in all cases, we have either $u \in LM$ or 
$v \in LM$. Thus $LM$ is open and hence is clopen. 

(b) Let $W = W_1 W_2 \cdots W_n$, where $W_i \in \{L, M\}$ for all $i$. $W$ is closed 
by Theorem 2(b). If each $W_i$ contains $\varepsilon$, then $W$ is open by repeated 
applications of Theorem 3(a) and is thus clopen. 

If there exist $i$ and $j$ with $i \ne j$, $\varepsilon \notin W_i$, and 
$\varepsilon \notin W_j$, then $W$ contains no words of length at most 1, so either 
$W = \emptyset$ or $W$ is not open (and thus not clopen). 

Finally, we deal with the case where there exists a unique $i$ such that 
$\varepsilon \notin W_i$. Suppose, without loss of generality, that $W_i = M$. Then 
$W = L^{i-1} M L^{n-i}$. Since $L \cup M$ is Kleene-closed, it must contain 
$\varepsilon$, so $\varepsilon \in L$. Thus $L^k = L$ for all positive $k$ by 
Theorem 1(b), so we must have $W = M$, $W = LM$, $W = ML$, or $W = LML$. In the 
first case, $W = M$ is known to be clopen, and in the second and third cases, $W$ 
is clopen by part (a). Thus we must only consider the case where $W = LML$. We 
know that $LM$ is clopen by part (a). Furthermore, $M \subseteq LM$ since 
$\varepsilon \in L$, so $LM \cup L \supseteq M \cup L = \Sigma^*$ and thus 
$LM \cup L = \Sigma^*$. Thus we can apply part (a) to $LM$ and $L$, proving that 
$LML$ is clopen. 



(c) As a counterexample, we let 
$L = \{\varepsilon\} \cup \{w \in \{a, b\}^* : |w|_a < |w|_b\}$ and let 
$M = \{\varepsilon\} \cup \{w \in \{a, b\}^* : |w|_a > |w|_b\}$, where by $|w|_c$ for 
a letter $c$, we mean the number of occurrences of $c$ in $w$. As we proved in 
[2, Example 1], $L$ and $M$ are both clopen. Furthermore, $L$ and $M$ both contain 
$\varepsilon$, so $LM$ is open by Theorem 3(a). 

Next, we show that $LM$ is closed. Let $u, v \in LM$, then let $u = u_1 u_2$ and 
$v = v_1 v_2$, where $u_1, v_1 \in L$ and $u_2, v_2 \in M$. We observe that 
$|u_1|_a < |u_1|_b$ and $|v_2|_a > |v_2|_b$. We examine the factor $u_2 v_1$ and 
consider two cases. If $|u_2 v_1|_a \ge |u_2 v_1|_b$, then 
$|u_2 v_1 v_2|_a > |u_2 v_1 v_2|_b$ and thus $u_2 v_1 v_2 \in M$. Since 
$u_1 \in L$, we must then have $uv = u_1 u_2 v_1 v_2 \in LM$. Similarly, if 
$|u_2 v_1|_a < |u_2 v_1|_b$, then $|u_1 u_2 v_1|_a < |u_1 u_2 v_1|_b$ and thus 
$u_1 u_2 v_1 \in L$. Since $v_2 \in M$, we must then have 
$uv = u_1 u_2 v_1 v_2 \in LM$. So in all cases, $uv \in LM$, and $LM$ is 
closed. Hence $LM$ is clopen. 

However, $L \cup M$ is not closed, since we have $b \in L \subseteq L \cup M$ and 
$a \in M \subseteq L \cup M$, but $ba \notin L \cup M$. 



3 Separation of words and languages 

Next, we discuss analogies of the separation axioms of topology in the realm of 
languages. Although languages do not form a topology under Kleene or positive 
closure, there are many interesting results describing when there exist open, 
closed, and clopen languages that separate given words or languages. In most of 
these theorems, we only consider words in $\Sigma^+$, as $\varepsilon$ is always a trivial case. 

Lemma 1. Let $w \in \Sigma^+$, and let $L \subseteq \Sigma^*$ be closed with 
$w \notin L$. Then there exists a finite open language $M$ such that $w \in M$ but 
$M \cap L = \emptyset$. 

Proof. We simply take $M = \overline{L} \cap \{x \in \Sigma^+ : |x| \le |w|\}$. This 
is clearly finite, and is open by our characterization. ■ 

Theorem 5. Let $u, v \in \Sigma^+$. 

(a) There exists an open language $L$ with $u \in L$ and $v \notin L$ if and only if 
for all natural numbers $k$, we have $u \ne v^k$. 

(b) If $u \ne v$, then either there exists an open language $L$ with $u \in L$ and 
$v \notin L$, or there exists an open language $L$ with $u \notin L$ and $v \in L$ 
(all words are distinguishable by open languages). 

Proof. (a) For the forward direction, we note that if $u = v^k$ for some positive 
$k$, then any open language containing $u$ must contain $v$ by Proposition 1(b). 
For the reverse direction, we apply Lemma 1 to $u$ and $\{v\}^+$, which is closed. 

(b) If $u \ne v$, we can just pick $w \in \{u, v\}$ such that $|w|$ is minimal. Then 
we take $L = \mathrm{Pref}(\{w\})$, which must contain one of $u$ or $v$ but not the 
other, and is open as we saw in [5, Example 2]. Here $\mathrm{Pref}(L)$ is the 
language of all prefixes of words in $L$. 



We now recall a basic result from combinatorics on words (see, e.g., [5]). 
Recall that a word $w$ is primitive if it cannot be expressed in the form $x^k$ for 
a word $x$ and an integer $k \ge 2$. 

Lemma 2. Let $u, v \in \Sigma^+$. The following are equivalent: 

(1) $uv = vu$, that is, $u$ and $v$ commute. 

(2) There exists a word $x$ and integers $p \ge 1$ and $q \ge 1$ such that 
$u = x^p$ and $v = x^q$. 

(3) There exists a word $y$ and integers $p \ge 1$ and $q \ge 1$ such that 
$y = u^p$ and $y = v^q$. 

(4) $u$ and $v$ are each a power of the same primitive word. 
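
The equivalence of conditions (1) and (4) is easy to test experimentally. The following is a minimal sketch; `primitive_root` and `commute` are illustrative helper names, and the primitive root is found by plain trial division on the word length rather than by any faster string algorithm.

```python
def primitive_root(w: str) -> str:
    """Shortest word x with w = x^k for some k >= 1 (w must be non-empty)."""
    n = len(w)
    for d in range(1, n + 1):
        # Try each divisor d of n as the length of a candidate root.
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("w must be non-empty")

def commute(u: str, v: str) -> bool:
    """Condition (1) of Lemma 2."""
    return u + v == v + u

# Condition (1) holds exactly when the primitive roots coincide (condition (4)).
print(commute("abab", "ab"), primitive_root("abab") == primitive_root("ab"))
print(commute("ab", "ba"), primitive_root("ab") == primitive_root("ba"))
```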

Let $u, v \in \Sigma^+$. Suppose there exists a clopen language 
$L \subseteq \Sigma^*$ with $u \in L$ and $v \notin L$. We note that $\overline{L}$ is 
also clopen whenever $L$ is, and we call the pair $(L, \overline{L})$ a clopen 
partition separating $u$ and $v$. 

Theorem 6. Let $u, v \in \Sigma^+$. There exists a clopen partition separating $u$ 
and $v$ if and only if $u$ and $v$ do not commute. 

Proof. We handle the forward direction first. Suppose a clopen language $L$ exists 
with $u \in L$ and $v \notin L$. If $u$ and $v$ commute, then by Lemma 2 there exists 
a word $x$ and integers $p$ and $q$ such that $u = x^p$ and $v = x^q$. In particular, 
this implies that any open set containing $u$ will also contain $x$, and any open 
set containing $v$ will also contain $x$. Then we must have both $x \in L$ (since 
$L$ is open and contains $u$) and $x \in \overline{L}$ (since $\overline{L}$ is open 
and contains $v$). Thus we have a contradiction, and $u$ and $v$ must not commute. 

For the reverse direction, we proceed by induction on $|u| + |v|$. We will apply 
the induction hypothesis on words in various alphabets, so we make no assumption 
that $|\Sigma|$ is constant. 

For our base case, suppose $|u| + |v| = 2$. If $u$ and $v$ do not commute, then 
they must be distinct words of length 1, and thus the language $\{u\}^+$ is a clopen 
language separating $u$ from $v$. 

Suppose, as a hypothesis, that for some $k \ge 2$, the result holds for all finite 
alphabets $\Sigma$ and for all $u, v \in \Sigma^+$ such that $2 \le |u| + |v| \le k$. 
Now, given any $\Sigma$, let $u, v \in \Sigma^+$ be such that $u$ and $v$ do not 
commute and $|u| + |v| = k + 1$. Let $\Sigma_u$ and $\Sigma_v$, respectively, be the 
sets of symbols that occur one or more times in $u$ and $v$. If 
$\Sigma_u \cap \Sigma_v = \emptyset$, then $\Sigma_u^+$ is a clopen language 
containing $u$ but not $v$, and our result holds. If not, suppose 
$a \in \Sigma_u \cap \Sigma_v$. Let $\lambda_u = |u|_a / |u|$ and 
$\lambda_v = |v|_a / |v|$ be the respective relative frequencies of $a$ in $u$ and 
$v$. If $\lambda_u > \lambda_v$, then 
$\{w \in \Sigma^* : |w|_a \ge \lambda_u |w|\}$ is clopen (by [2, Example 1]) and 
contains $u$ but not $v$, and we are done. Similarly, if $\lambda_u < \lambda_v$, 
then $\{w \in \Sigma^* : |w|_a \le \lambda_u |w|\}$ is a clopen language containing 
$u$ but not $v$. Thus it remains to show that the result holds when 
$\lambda_u = \lambda_v$. 

Assume $\lambda_u = \lambda_v = \lambda$. If $\lambda = 1$, then $u = a^i$ and 
$v = a^j$ for some positive integers $i$ and $j$, and thus $u$ and $v$ commute, 
contradicting our original assumption. Hence we must have $0 < \lambda < 1$. Let 
$n = \frac{|u|}{\gcd(|u|_a, |u|)} = \frac{|v|}{\gcd(|v|_a, |v|)}$ be the 
denominator of $\lambda$ when it is expressed in lowest terms. We must have $n > 1$ 
since $\lambda$ is not an integer. 

Next, we consider a new alphabet $\Delta$ with $|\Sigma|^n$ symbols, each 
corresponding to a word of length $n$ in $\Sigma^*$. We consider the bijective 
morphism $\phi$ mapping words in $\Delta^*$ to words in $(\Sigma^n)^*$ by replacing 
each symbol in $\Delta$ with its corresponding word in $\Sigma^n$. Since $n$ divides 
both $|u|$ and $|v|$, there must then exist unique words $p, q \in \Delta^*$ such 
that $\phi(p) = u$ and $\phi(q) = v$. 

Our plan is now to inductively create a clopen language $L$ over $\Delta$ which 
contains $p$ but not $q$, and then use this language to construct our clopen 
partition over $\Sigma$ separating $u$ and $v$. We must check that $p$ and $q$ do 
not commute. If $pq = qp$ then we would have 
$uv = \phi(p)\phi(q) = \phi(pq) = \phi(qp) = \phi(q)\phi(p) = vu$, since $\phi$ 
is a morphism. This is impossible since $uv \ne vu$, so $p$ and $q$ do not commute. 
We also have $n|p| + n|q| = |u| + |v|$. Since $n > 1$ implies 
$|p| + |q| < |u| + |v| = k + 1$, the induction hypothesis can be applied to $p$ and 
$q$. Thus there exists a clopen language $L \subseteq \Delta^*$ with $p \in L$ and 
$q \notin L$. 

We now construct our clopen partition over $\Sigma$ separating $u$ and $v$. We 
introduce some notation to make this easier. As usual, define 
$\phi(L) = \{w \in \Sigma^* : w = \phi(r) \text{ for some } r \in L\}$. Let 
$\Lambda_< = \{w \in \Sigma^* : |w|_a < \lambda |w|\}$ and let 
$\Lambda_= = \{w \in \Sigma^* : |w|_a = \lambda |w|\}$. Additionally, let 
$\Lambda_\le = \Lambda_< \cup \Lambda_=$. It is easy to verify that $\Lambda_<$, 
$\Lambda_\le$, and $\Lambda_=$ are all closed, and both $\Lambda_<$ and 
$\Lambda_\le$ are open as well. Finally, we let 
$M = (\phi(L) \cap \Lambda_=) \cup \Lambda_<$. Since $p \in L$ and $q \notin L$, we 
must have $u \in \phi(L)$ and $v \notin \phi(L)$. Then since $u$ and $v$ are both 
contained in $\Lambda_=$ but not $\Lambda_<$, we must have $u \in M$ and 
$v \notin M$. We will now finish the proof by showing that $M$ is clopen. 

We first show that $M$ is closed. Let $x, y \in M$. We must show that $xy \in M$. 
There are two cases to consider: 

Case (A1): $x, y \in (\phi(L) \cap \Lambda_=)$. We see that 
$\phi(L)\phi(L) = \phi(LL) \subseteq \phi(L)$, so $\phi(L)$ is closed. Then since 
$\Lambda_=$ is closed, $(\phi(L) \cap \Lambda_=)$ is the intersection of two 
closed languages, and hence closed. Thus 
$xy \in (\phi(L) \cap \Lambda_=) \subseteq M$. 

Case (A2): One or more of $x$ or $y$ is not in $(\phi(L) \cap \Lambda_=)$. Without 
loss of generality, suppose $x \notin (\phi(L) \cap \Lambda_=)$. Then 
$x \in \Lambda_<$, so $|x|_a < \lambda |x|$. Furthermore, 
$y \in M \subseteq \Lambda_\le$, so $|y|_a \le \lambda |y|$. Adding these two 
inequalities yields $|x|_a + |y|_a < \lambda |x| + \lambda |y|$, so 
$|xy|_a < \lambda |xy|$ and thus $xy \in \Lambda_< \subseteq M$. 

Lastly, we show that $M$ is open. Let $z \in M$ and suppose $z = xy$ for some 
$x, y \in \Sigma^*$. We show that $x \in M$ or $y \in M$. Again, we have two cases to 
consider: 

Case (B1): $z \in \Lambda_<$. Since $\Lambda_<$ is open, at least one of $x$ or $y$ 
is in $\Lambda_<$. Since $\Lambda_< \subseteq M$, we are done. 

Case (B2): $z \in (\phi(L) \cap \Lambda_=)$. If either $x$ or $y$ is in 
$\Lambda_<$, then we are done, so assume otherwise. Then $|x|_a \ge \lambda |x|$ 
and $|y|_a \ge \lambda |y|$. But $|xy|_a = \lambda |xy|$, so we must have 
$|x|_a = \lambda |x|$ and $|y|_a = \lambda |y|$, and thus $x, y \in \Lambda_=$. Then 
$\lambda |x|$ and $\lambda |y|$ must be integers, and hence $n$ divides both $|x|$ 
and $|y|$. Then there exist $s, t \in \Delta^*$ such that $\phi(s) = x$ and 
$\phi(t) = y$. But since $\phi$ is a morphism, we must then have 
$\phi(st) = \phi(s)\phi(t) = xy = z$. But $z$ is in $\phi(L)$ and $\phi$ is 
injective, so $st \in L$. Since $L$ is open, we must then have either $s \in L$ or 
$t \in L$. Thus we must have either $x = \phi(s) \in \phi(L)$ or 
$y = \phi(t) \in \phi(L)$. Then one of $x$ or $y$ is in 
$\phi(L) \cap \Lambda_= \subseteq M$. 



Thus M is both closed and open, and the result follows by induction. 
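
The inductive construction in this proof is effective, and can be turned into a recursive procedure that returns a membership test for a separating clopen language. The following is a minimal sketch under stated assumptions: words are represented as tuples of hashable symbols so that blocks of length $n$ can serve as symbols of the new alphabet, the frequency cases use the non-strict languages $\{w : |w|_a \ge \lambda_u |w|\}$ and $\{w : |w|_a \le \lambda_u |w|\}$, and the names `separator`, `blocks`, and `member` are illustrative rather than taken from the paper.

```python
from fractions import Fraction

def separator(u, v):
    """Return a membership predicate for a clopen language containing u but
    not v. Words are sequences of hashable symbols; u and v must be
    non-empty and must not commute."""
    u, v = tuple(u), tuple(v)
    if not u or not v or u + v == v + u:
        raise ValueError("u and v must be non-empty and must not commute")
    common = set(u) & set(v)
    if not common:
        # Disjoint alphabets: the language (Sigma_u)^+ separates u from v.
        sigma_u = set(u)
        return lambda w: len(w) > 0 and all(c in sigma_u for c in w)
    a = min(common, key=repr)               # any common symbol will do
    lam_u = Fraction(u.count(a), len(u))    # relative frequency of a in u
    lam_v = Fraction(v.count(a), len(v))
    if lam_u > lam_v:
        return lambda w: tuple(w).count(a) >= lam_u * len(w)
    if lam_u < lam_v:
        return lambda w: tuple(w).count(a) <= lam_u * len(w)
    lam, n = lam_u, lam_u.denominator       # equal frequencies, so n > 1

    def blocks(w):
        # Inverse of the morphism phi: cut w into blocks of length n.
        return tuple(tuple(w[i:i + n]) for i in range(0, len(w), n))

    inner = separator(blocks(u), blocks(v))  # clopen L over the block alphabet

    def member(w):
        w = tuple(w)
        count = w.count(a)
        if count < lam * len(w):             # w lies in Lambda_<
            return True
        # If w lies in Lambda_=, then n divides len(w) and blocks(w) is exact.
        return count == lam * len(w) and inner(blocks(w))

    return member

in_M = separator("ab", "ba")
print(in_M("ab"), in_M("ba"))
```

Using exact `Fraction` arithmetic avoids rounding errors when comparing $|w|_a$ with $\lambda |w|$.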



Corollary 1. Let $u, v \in \Sigma^+$. There exist non-intersecting finite open 
languages $L$ and $M$ with $u \in L$ and $v \in M$ if and only if $u$ and $v$ do not 
commute. 

Proof. As in the proof of Theorem 6, we note that if $u$ and $v$ commute, then 
there is some $x$ such that $u = x^p$ and $v = x^q$, implying that every open 
language containing $u$ or $v$ must contain $x$, and thus no two disjoint open 
languages can contain $u$ and $v$ respectively. If $u$ and $v$ do not commute, then 
by Theorem 6, let $K$ be a clopen language containing $u$ but not $v$. We then take 
$L = \{w \in K : |w| \le |u|\}$ and $M = \{w \in \overline{K} : |w| \le |v|\}$. 
These are open by Proposition 1(b), since $K$ and $\overline{K}$ are both open. ■ 

We can also use Theorem 6 to extend the topological notion of connected 
components to the setting of formal languages. We say that words 
$u, v \in \Sigma^+$ are disconnected if there exists a clopen partition separating 
$u$ from $v$, and connected otherwise. We write $u \sim v$ if $u$ and $v$ are 
connected, and note that $\sim$ is an equivalence relation (indeed, this is the 
case when we consider the clopen partitions created by any closure operator; it 
need not be topological). Since Theorem 6 implies that $u \sim v$ if and only if 
$u = x^p$ and $v = x^q$ for some word $x$ and integers $p$ and $q$, it follows that 
each connected component of $\Sigma^+$ consists of a primitive word and all of its 
powers. Connected components of other languages will simply consist of collections 
of words sharing a common primitive root. 

It should be noted that connected components must be closed, but they need 
not be clopen. In fact, the only clopen components of $\Sigma^+$ are the languages 
$\{a\}^+$ for each $a \in \Sigma$. 

As in [5], we say that a closure operator $\square$ preserves openness if 
$L^{\square}$ is open for all open sets $L$. We recall that positive closure 
preserves openness [2, Theorem 3], and use it to prove the following theorem, 
which indeed holds for all closure operators that preserve openness. 

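Theorem 7. If $L, M \subseteq \Sigma^*$ are disjoint and open, then $L^+$ and $M^+$ 
are disjoint. 

Proof. If $L \cap M = \emptyset$, then $M \subseteq \overline{L}$. Then by 
isotonicity, $M^+ \subseteq \overline{L}^+ = \overline{L}$, since $\overline{L}$ is 
closed. But then $L \subseteq \overline{M^+}$. Applying isotonicity again yields 
$L^+ \subseteq (\overline{M^+})^+$. But $M^+$ is the closure of an open language 
and is thus clopen, so $\overline{M^+}$ is also clopen and thus 
$(\overline{M^+})^+ = \overline{M^+}$. Hence $L^+ \subseteq \overline{M^+}$, and it 
follows that $L^+$ and $M^+$ are disjoint. ■ 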

Corollary 2. Let $L, M \subseteq \Sigma^*$ be closed and such that 
$L \cup M = \Sigma^*$. Then $L^{\oplus} \cup M^{\oplus} = \Sigma^*$. 

In our setting, it is not true that a single "point" $x$ and a closed set $S$ can 
be separated by two open sets. As a counterexample, consider $x = ab$ and 
$S = \{aa, bb\}^*$. Furthermore, it is not true that arbitrary disjoint sets, even 
ones whose closures are disjoint, can be clopen separated. As an example, consider 
$\{ab\}^*$ and $\{aa, bb\}^*$. 



4 Algorithms 

We now consider the computational complexity of determining if a given lan- 
guage L is closed or open. Of course, the answer depends on how L is represented. 

Theorem 8. Given an $n$-state DFA $M = (Q, \Sigma, \delta, q_0, F)$ accepting the 
regular language $L$, we can determine in $O(n^2)$ time if $L$ is closed or open. 

Proof. We prove the result when $L$ is positive-closed. For Kleene-closed, we have 
the additional check $q_0 \in F$. For the open case, we start with a DFA for 
$\overline{L}$. 

We know from Proposition 1(a) that $L$ is closed if and only if, for all 
$u, v \in L$, we have $uv \in L$. Given $M$, we create an NFA-$\varepsilon$ $M'$ 
that accepts all words $x \notin L$ such that there exists a decomposition $x = uv$ 
with $u, v \in L$. Then $L(M')$ is empty if and only if $L$ is closed. 
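
The emptiness test at the heart of this proof can be sketched as a search over single states followed by a search over pairs of states. This is a minimal sketch, not the construction verbatim: it assumes a complete DFA given as a dictionary `delta` from `(state, symbol)` to state, uses breadth-first rather than depth-first search, and stores witness words for readability, so as written it does not achieve the $O(n^2)$ bound. The function name `find_counterexample` is illustrative.

```python
from collections import deque

def find_counterexample(alphabet, delta, q0, final):
    """Return (u, v) with u, v in L and uv not in L, or None if L is
    positive-closed. delta maps (state, symbol) -> state (complete DFA)."""
    # Phase 1: every state p reachable from q0, with a word u reaching it.
    reach = {q0: ""}
    queue = deque([q0])
    while queue:
        p = queue.popleft()
        for a in alphabet:
            r = delta[(p, a)]
            if r not in reach:
                reach[r] = reach[p] + a
                queue.append(r)
    # Phase 2: start from pairs [p, q0] with p final (so u is in L), then
    # track delta(q0, uv) in the first component and delta(q0, v) in the second.
    seen = {}
    queue = deque()
    for p, u in reach.items():
        if p in final:
            seen[(p, q0)] = (u, "")
            queue.append((p, q0))
    while queue:
        p, q = queue.popleft()
        u, v = seen[(p, q)]
        if p not in final and q in final:    # uv rejected, v accepted
            return u, v
        for a in alphabet:
            nxt = (delta[(p, a)], delta[(q, a)])
            if nxt not in seen:
                seen[nxt] = (u, v + a)
                queue.append(nxt)
    return None

# DFA for {a}+ over {a}: state 1 is final; the language is closed.
print(find_counterexample("a", {(0, "a"): 1, (1, "a"): 1}, 0, {1}))
```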

Here is the construction of $M'$: $M' = (Q', \Sigma, \delta', q_0', F')$, where 
$Q' = Q \cup (Q \times Q)$, $q_0' = q_0$, $F' = (Q - F) \times F$, and $\delta'$ is 
defined as follows: 

$\delta'(p, a) = \{\delta(p, a)\}$ for $p \in Q$, $a \in \Sigma$; 
$\delta'(p, \varepsilon) = \{[p, q_0]\}$, if $p \in F$; 
$\delta'([p, q], a) = \{[\delta(p, a), \delta(q, a)]\}$ for $p, q \in Q$, $a \in \Sigma$. 

$M'$ functions as follows: on input $uv$, it first simulates the computation of $M$ 
on $u$. If and only if a final state is reached (and so $u \in L$), $M'$ has the 
option to use its $\varepsilon$-transition to enter a state specified by two 
components, the second of which is $q_0$. Now $M'$ processes $v$, determining 
$\delta(q_0, uv)$ in its first component and $\delta(q_0, v)$ in the second. If 
$uv \notin L$, but $v \in L$, then $M'$ accepts. Thus $M'$ accepts $uv$ if and 
only if $u, v \in L$ and $uv \notin L$. 

We now use the usual depth-first search technique to determine if $L(M')$ is 
empty, which uses time proportional to the number of states and transitions of 
$M'$. Since $M'$ has $|Q||\Sigma| + |F| + |Q|^2|\Sigma|$ transitions and 
$|Q| + |Q|^2$ states, our depth-first search can be done in $O(n^2)$ time. ■ 

From Proposition 1(a), we know that $L$ is not closed if and only if there 
exists a word $uv \notin L$ such that $u, v \in L$. We call such a word a 
counterexample. 

Corollary 3. If $L$ is a regular language, accepted by an $n$-state DFA, that is 
not closed, then the smallest counterexample is of length $\le n^2 + n - 1$. 

This $O(n^2)$ upper bound on the length of the shortest counterexample is 
matched by a corresponding $\Omega(n^2)$ lower bound: 

Theorem 9. There exists a class of DFA's $M_n$ with $2n + 5$ states, having the 
following property: the shortest word $x \notin L(M_n)$ such that there exist 
$u, v \in L(M_n)$ with $x = uv$ is of length $n^2 + 2n + 2$. 

Proof. It is conceptually easier to describe DFA's 
$M'_n = (Q, \Sigma, \delta, q_0, F)$ that accept the complement of $L(M_n)$. In 
other words, we will show that the shortest word $x \in L(M'_n)$ such that there 
exist $u, v \notin L(M'_n)$ with $x = uv$ is of length $n^2 + 2n + 2$. The parts of 
the DFA $M'_n$ are as follows: 

$Q = \{q_0, q_1, \ldots, q_n, r, p_0, p_1, \ldots, p_n, s, d\}$ 

$F = \{q_0, q_1, \ldots, q_n, p_0, p_1, \ldots, p_n, s\}$ 
and $\delta$ is given in Table 1. 
Table 1. Transition function $\delta(q, a)$ of $M'_n$ 



The case $n = 5$ is illustrated in Figure 1. 











Fig. 1. Example of DFA $M'_n$ for $n = 5$. Unspecified transitions go to the dead 
state $d$. 

First, we observe that $x = 1 0^{n-1} 1 1 0^{n^2+n-1} 1$ is accepted by $M'_n$, but 
neither $u = 1 0^{n-1} 1$ nor $v = 1 0^{n^2+n-1} 1$ is. Next, take any word $x'$ 
accepted by $M'_n$. If the acceptance path does not pass through $r$, then by 
examining the DFA we see that every prefix of $x'$ is also accepted. Otherwise, the 
acceptance path passes through $r$. Again, we see that every prefix of $x'$ is 
accepted, with the possible exception of the prefix ending at $r$. Thus either 
$x'$ is of the form $1 0^{in+n-1} 1 1 0^k$ for some $i, k \ge 0$, or $x'$ is of the 
form $1 0^{in+n-1} 1 1 0^{j(n+1)+n} 1$ for some $i, j \ge 0$. In both cases the 
prefix ending at $r$ is $1 0^{in+n-1} 1$, so in the first case, the corresponding 
suffix is $1 0^k$ for some $k \ge 0$, and this suffix is accepted by $M'_n$. In the 
latter case the corresponding suffix is $1 0^{j(n+1)+n} 1$. This is accepted unless 
$j(n+1) + n$ is of the form $in + n - 1$. If $in + n - 1 = j(n+1) + n$, then by 
taking both sides modulo $n$, we see that $j \equiv -1 \pmod{n}$. Thus 
$j \ge n - 1$. Thus 
$|x'| \ge 1 + (n-1) + 1 + 1 + (n-1)(n+1) + n + 1 = n^2 + 2n + 2$. ■ 

We now turn to the case where the language is represented by an NFA or regular 
expression. We need the following classical lemma [1]: 

Lemma 3. Let $T$ be a one-tape deterministic Turing machine and $p(n)$ a 
polynomial such that $T$ never uses more than $p(|x|)$ space on input $x$. Then 
there is a finite alphabet $\Delta$ and a polynomial $q(n)$ such that we can 
construct a regular expression $r_x$ in $q(|x|)$ steps, such that 
$L(r_x) = \Delta^*$ if $T$ doesn't accept $x$, and 
$L(r_x) = \Delta^* \setminus \{w\}$ for some nonempty $w$ (depending on $x$) 
otherwise. Similarly, we can construct an NFA $M_x$ in $q(|x|)$ steps, such that 
$L(M_x) = \Delta^*$ if $T$ doesn't accept $x$, and 
$L(M_x) = \Delta^* \setminus \{w\}$ for some nonempty $w$ (depending on $x$) 
otherwise. 

For the following theorem, we actually require the word $w$ exhibited in the 
lemma above to have length $\ge 2$. However, this can easily be accomplished 
via a trivial modification of the proof given in [1], since the word $w$ encodes a 
configuration of the Turing machine $T$. 

Theorem 10. The following problem is PSPACE-complete: given an NFA $M$, 
decide if $L(M)$ is closed. 

Proof. First, we observe that the problem is in PSPACE. We give a 
nondeterministic polynomial-space algorithm to decide if $L(M)$ is not closed, and 
use Savitch's theorem to conclude the result. 

If $M$ has $n$ states, then there is an equivalent DFA $M'$ with $N \le 2^n$ states. 
From Corollary 3 we know that if $L = L(M) = L(M')$ is not closed, then there 
exist words $u, v$ with $u, v \in L$ but $uv \notin L$, and 
$|uv| \le N^2 + N - 1 \le 2^{2n} + 2^n - 1$. We now guess $u$, processing it 
symbol-by-symbol, arriving in a set of states $S$ of $M$. Next, we guess $v$, 
processing it symbol-by-symbol starting from both $S$ and $\{q_0\}$, respectively, 
and ending in sets of states $T$ and $U$. If $S$ and $U$ each contain a state of 
$F$ and $T$ does not, then we have found $u, v \in L$ such that $uv \notin L$. While 
we guess $u$ and $v$, we count the number of symbols guessed, and reject if that 
number is greater than $2^{2n} + 2^n - 1$. The counter and the three sets of states 
each need only $O(n)$ bits, so the algorithm runs in polynomial space. 

Now we show the problem is PSPACE-hard. To do so, we observe that $\Delta^*$ is 
closed, but $\Delta^* \setminus \{w\}$ for $w$ with $|w| \ge 2$ is not, since $w$ is 
then a product of two shorter non-empty words, both of which lie in the language. 
Thus, using our modification of Lemma 3, we could use an algorithm solving the 
problem of whether a language is closed to decide acceptance for polynomial-space 
bounded Turing machines. ■ 
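
The membership argument tracks sets of NFA states while reading $u$ and then $v$. The following is a minimal sketch that swaps the nondeterministic polynomial-space guessing for a deterministic breadth-first search over sets of states, so it may use exponential time and space and is intended only to illustrate what is being tracked. It assumes an NFA without $\varepsilon$-moves given as a dictionary `delta` from `(state, symbol)` to a set of states; the name `nfa_language_closed` is illustrative.

```python
from collections import deque

def nfa_language_closed(alphabet, delta, q0, final):
    """Return True iff L(M) is positive-closed, for an NFA M without
    epsilon moves. delta maps (state, symbol) -> set of states."""
    def step(S, a):
        return frozenset(r for s in S for r in delta.get((s, a), ()))

    def accepts(S):
        return any(s in final for s in S)

    start = frozenset([q0])
    # All sets of states reachable from {q0} by reading some word u.
    reach, queue = {start}, deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            nxt = step(S, a)
            if nxt not in reach:
                reach.add(nxt)
                queue.append(nxt)
    # For each accepting S (u in L), read v from S and from {q0} in parallel.
    pairs = {(S, start) for S in reach if accepts(S)}
    queue = deque(pairs)
    while queue:
        after_uv, after_v = queue.popleft()
        if accepts(after_v) and not accepts(after_uv):
            return False                  # u, v in L but uv not in L
        for a in alphabet:
            nxt = (step(after_uv, a), step(after_v, a))
            if nxt not in pairs:
                pairs.add(nxt)
                queue.append(nxt)
    return True

# NFA for {a}+{b}+ : not closed, since ab is accepted but abab is not.
delta_ab = {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {2}, (2, "b"): {2}}
print(nfa_language_closed("ab", delta_ab, 0, {2}))
```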

We note that it is possible for an $n$-state NFA $M$ to have the property that 
$L(M)$ is not closed, but the minimal-length example of a word $uv$ with 
$u, v \in L$ but $uv \notin L$ is exponentially long. Such an example is given in 
[4], where it is shown that for some constant $c$, there exist NFAs with $n$ 
states such that the smallest word not accepted is of length $\ge 2^{cn}$. 

We note that the problem of deciding, for a given NFA $M$, whether $L(M)$ is 
open is also PSPACE-complete. The proof is similar to that of Theorem 10. 



5 Compactness 



A closure operator is an algebraic closure operator (also called a finitary closure 
operator) if $X^{\square} = \bigcup \{Y^{\square} : Y \subseteq X \text{ and } Y \text{ finite}\}$. 
It is easy to show that both Kleene and positive closures are algebraic closure 
operators. Closed languages form a complete lattice under the partial ordering 
given by set inclusion. The meet and join (infimum and supremum) operators are the 
following: 

$L \wedge R = L \cap R$. 
$L \vee R = (L \cup R)^{\square}$. 

A language $L$ is a compact element of this lattice if and only if whenever 
$\{M_i : i \in I\}$ is a family of languages for some arbitrary index set $I$ with 
$L \subseteq \bigvee_{i \in I} M_i$, there is some finite $J \subseteq I$ such that 
$L \subseteq \bigvee_{i \in J} M_i$. It turns out that our lattice is compactly 
generated, meaning that every language is the supremum of compact elements. It is 
therefore an algebraic lattice, and the compact elements are simply closures of 
finite languages (as is true for the inclusion lattice of any algebraic closure 
operator; see [3]). Thus we will say a language is compact whenever it is the 
closure of a finite language. What follows are some results about compact 
languages. 
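
Since a compact language is the closure of a finite language, membership in it can be decided from the finite generating set alone. The following is a minimal sketch for positive closure, using dynamic programming over prefixes; the function name `in_positive_closure` is illustrative, and the generating set is modelled as a Python set of strings.

```python
def in_positive_closure(w: str, X: set[str]) -> bool:
    """Decide whether w is in X^+ for a finite language X."""
    if w == "":
        return "" in X                     # the empty word needs an empty factor
    pieces = {x for x in X if x}           # empty factors never help a non-empty w
    ok = [False] * (len(w) + 1)            # ok[i]: w[:i] is a product of pieces
    ok[0] = True
    for i in range(1, len(w) + 1):
        ok[i] = any(
            ok[i - len(x)] and w[i - len(x):i] == x
            for x in pieces
            if len(x) <= i
        )
    return ok[len(w)]

X = {"ab", "aba"}
print(in_positive_closure("ababa", X), in_positive_closure("ba", X))
```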

Theorem 11. Let $L, M \subseteq \Sigma^*$ be compact. Then $(L \cup M)^{\square}$ is 
compact. 

Theorem 12. Let $L \subseteq \Sigma^*$ be compact and let $M \subseteq \Sigma^*$ be 
finite. Then 

(1) $L \cup M$ is compact if and only if it is closed. 

(2) $L \setminus M$ is compact if and only if it is closed. 

Theorem 13. Let $L$ be finite and open. Then $\overline{L}$ is compact. 